A Greedy Correlation Measure Based Attribute Clustering Algorithm for Gene Selection

Authors

  • Jiucheng Xu
  • Yun-peng Gao
  • Shuangqun Li
  • Lin Sun
  • Tianhe Xu
  • Jinyu Ren
Abstract

This paper proposes an attribute clustering algorithm that groups attributes into clusters in order to obtain meaningful modes from microarray data. First, the attribute clustering problem is analyzed, and neighborhood mutual information is introduced to solve it. An attribute clustering algorithm is then presented that groups attributes into clusters by optimizing a criterion function derived from an information measure reflecting the correlation between attributes. Applying this method to gene expression data yields meaningful clusters that help capture aspects of gene association patterns, so that significant genes containing useful information for gene classification and identification can be selected. The proposed algorithm is then applied to six gene expression data sets and compared with several well-known gene selection methods. Experiments show that the greedy correlation measure based attribute clustering algorithm, denoted GCMACA, is more capable of discovering meaningful clusters of genes. By selecting a subset of genes that have highly significant multiple correlation values with the other genes in their clusters, informative genes can be acquired and gene expression of different categories can be identified as well.
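The abstract does not spell out the criterion function or the greedy search, so the following is only a minimal sketch under stated assumptions, not the GCMACA procedure itself. It assumes a common definition of neighborhood entropy, NH_delta(B) = -(1/n) * sum_i log(|delta_B(x_i)| / n), with Chebyshev distance and radius delta, and takes neighborhood mutual information as NMI(a; b) = NH(a) + NH(b) - NH(a, b). Clustering is swapped in as a k-modes-style loop: each attribute joins the mode it shares the most NMI with, and each cluster's new mode is the member with the largest summed NMI to the rest of its cluster, standing in for the "multiple correlation" used to pick genes. The names neighborhood_entropy, nmi, greedy_attribute_clustering and the parameters delta, k, max_iter, seed are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = samples, columns = genes (attributes)


def neighborhood_entropy(data: Matrix, attrs: Sequence[int], delta: float) -> float:
    """Assumed form: NH_delta(B) = -(1/n) * sum_i log(|delta_B(x_i)| / n).

    delta_B(x_i) holds every sample whose Chebyshev (max-abs) distance to
    x_i, measured only on the attributes in B, is at most delta.
    """
    n = len(data)
    total = 0.0
    for i in range(n):
        size = sum(
            1
            for j in range(n)
            if max(abs(data[i][a] - data[j][a]) for a in attrs) <= delta
        )
        total += math.log(size / n)  # size >= 1 because x_i is its own neighbor
    return -total / n


def nmi(data: Matrix, a: int, b: int, delta: float) -> float:
    """Neighborhood mutual information NMI(a; b) = NH(a) + NH(b) - NH(a, b)."""
    return (
        neighborhood_entropy(data, [a], delta)
        + neighborhood_entropy(data, [b], delta)
        - neighborhood_entropy(data, [a, b], delta)
    )


def greedy_attribute_clustering(
    data: Matrix, k: int, delta: float, max_iter: int = 20, seed: int = 0
) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical k-modes-style attribute clustering driven by pairwise NMI.

    Returns (modes, clusters); each mode is the cluster member with the
    largest summed NMI to the other members and serves as the selected gene.
    """
    m = len(data[0])
    r = [[0.0] * m for _ in range(m)]  # symmetric pairwise NMI matrix
    for a in range(m):
        for b in range(a + 1, m):
            r[a][b] = r[b][a] = nmi(data, a, b, delta)

    modes = random.Random(seed).sample(range(m), k)
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Assignment step: every non-mode attribute joins the mode it is most
        # correlated with (ties go to the lowest cluster index).
        clusters = [[mode] for mode in modes]
        for a in range(m):
            if a not in modes:
                best = max(range(k), key=lambda c: r[a][modes[c]])
                clusters[best].append(a)
        # Update step: the new mode is the member with the largest total NMI to
        # the rest of its cluster; ties keep the first member, i.e. the old mode.
        new_modes = [
            max(c, key=lambda a: sum(r[a][b] for b in c if b != a)) for c in clusters
        ]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, clusters


# Usage: columns 0-2 are shifted copies of u, columns 3-5 shifted copies of v.
n = 30
u = [i / n for i in range(n)]
v = [((7 * i) % n) / n for i in range(n)]  # a permutation of u's values
data = [[u[i], u[i] + 0.5, u[i], v[i], v[i] + 1.0, v[i]] for i in range(n)]
modes, clusters = greedy_attribute_clustering(data, k=2, delta=0.11)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

On this toy data the sketch recovers the two groups of mutually informative columns. Precomputing the pairwise NMI matrix once keeps each assignment and update pass cheap; the paper's own criterion function and greedy search may differ from this stand-in.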


Similar resources

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...


A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...


Modification in Weighted Clustering Algorithm for Faster Clustering Formation by Considering Absolute Attributes of Mobile Nodes and Greedy Method for Role Selection of Mobile Nodes in MANET

Wirelessly allocated mobile nodes which are configured in a geographically adjacent to each other can form cluster by applying rules on the mobile nodes. Each cluster family have different members with different assigned roles such as cluster head, cluster members, gateway members and ordinary nodes which can perform roles of any three mentioned roles as the time progresses based on absolute an...


Consistency-preserving attribute reduction in fuzzy rough set framework

Attribute reduction (feature selection) has become an important challenge in areas of pattern recognition, machine learning, data mining and knowledge discovery. Based on attribute reduction, one can extract fuzzy decision rules from a fuzzy decision table. As consistency is one of several criteria for evaluating the decision performance of a decision-rule set, in this paper, we devote to prese...


The ensemble clustering with maximize diversity using evolutionary optimization algorithms

Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...



Journal title:
  • JCP

Volume 8

Publication date: 2013